Re: qsort again (was Re: [PERFORM] Strange Create Index - Mailing list pgsql-hackers

From Jens-Wolfhard Schicke
Subject Re: qsort again (was Re: [PERFORM] Strange Create Index
Date
Msg-id 97E2D852E73713354532B531@[192.168.1.72]
Whole thread Raw
In response to Re: qsort again (was Re: [PERFORM] Strange Create Index  ("Dann Corbit" <DCorbit@connx.com>)
Responses Re: qsort again (was Re: [PERFORM] Strange Create Index  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
--On Donnerstag, Februar 16, 2006 10:39:45 -0800 Dann Corbit
<DCorbit@connx.com> wrote:

> He refers to counting sort and radix sort (which comes in most
> significant digit and least significant digit format).  These are also
> called distribution (as opposed to comparison) sorts.
>
> These sorts are O(n) as a function of the data size, but really they are
> O(M*n) where M is the average key length and n is the data set size.
> (In the case of MSD radix sort M is the average length to completely
> differentiate strings)
>
> So if you have an 80 character key, then 80*log(n) will only be faster
I suppose you meant 80 * n here?

> than n*log(n) when you have 2^80th elements -- in other words -- never.
I think this is wrong. You can easily improve Radix sort by a constant if
you don't take single bytes as the digits but rather k-byte values. At
least 2 byte should be possible without problems. This would give you 40 *
n time, not 80 * n. And you assume that the comparision of an 80-byte wide
value only costs 1, which might (and in many cases will be imho) wrong.
Actually it migh mean to compare 80 bytes as well, but I might be wrong.

What I think as the biggest problem is the digit representation necessary
for Radix-Sort in cases of locales which sort without looking at spaces. I
assume that would be hard to implement. The same goes for the proposed
mapping of string values onto 4/8-byte values.

Mit freundlichem Gruß
Jens Schicke
--
Jens Schicke              j.schicke@asco.de
asco GmbH              http://www.asco.de
Mittelweg 7              Tel 0531/3906-127
38106 Braunschweig          Fax 0531/3906-400


pgsql-hackers by date:

Previous
From: "Luke Lonergan"
Date:
Subject: Re: In Japan with Josh Berkus
Next
From: Martijn van Oosterhout
Date:
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index